Hadoop Based Link Prediction Performance Analysis
Abstract
Link prediction is an important problem in social network analysis and has been applied in a variety of fields. It aims to estimate the likelihood that a link exists between two nodes, given the known network structure. The time complexity of link prediction algorithms on huge-scale networks remains unexplored and unsolved, especially for sparse networks. In this project, we explore how parallel computing can speed up link prediction on huge-scale networks. We implemented similarity-based link prediction algorithms on MapReduce that run in O(n) time on sparse networks. We analyzed the performance of our algorithms on the Data Intensive Science Cluster at the University of Notre Dame: we evaluated performance under different configurations, monitored the resource utilization of the distributed computation, and optimized accordingly. After comparing the efficiency of these configurations, we present the fastest approach we found for parallelized link prediction, which is particularly suited to real-world big data.
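The abstract does not name the similarity indices that were implemented. As a minimal sketch, assuming the common-neighbors index (a standard similarity-based score) and an undirected graph stored as adjacency lists, the map, shuffle, and reduce stages can be simulated in plain Python. The names `map_phase`, `shuffle`, `reduce_phase`, and `common_neighbor_scores` are hypothetical; on a real cluster the mapper and reducer would run as separate tasks (for example through Hadoop Streaming) and the framework itself would perform the shuffle.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical mapper: one input record is (node, set of neighbors).
# Every pair of neighbors shares `node` as a common neighbor, so emit
# ((u, v), 1) for each such pair, with u < v as the canonical key.
def map_phase(node, neighbors):
    for u, v in combinations(sorted(neighbors), 2):
        yield (u, v), 1

# Simulates the framework's shuffle: group intermediate values by key.
def shuffle(mapped):
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

# Hypothetical reducer: the common-neighbors score of a pair is the
# number of nodes that emitted it, i.e. the sum of its counts.
def reduce_phase(pair, counts):
    return pair, sum(counts)

def common_neighbor_scores(adjacency):
    """Run map -> shuffle -> reduce on an undirected adjacency dict and
    keep only pairs that are not already linked (candidate new links)."""
    mapped = (kv for node, nbrs in adjacency.items()
              for kv in map_phase(node, nbrs))
    scores = dict(reduce_phase(pair, counts)
                  for pair, counts in shuffle(mapped).items())
    return {(u, v): s for (u, v), s in scores.items()
            if v not in adjacency[u]}

if __name__ == "__main__":
    # Toy graph with edges 1-2, 1-3, 2-3, 2-4, 3-4; only 1-4 is missing.
    graph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3}}
    print(common_neighbor_scores(graph))  # {(1, 4): 2}
```

In the toy graph the only unlinked pair is (1, 4), which shares neighbors 2 and 3, so it scores 2. A node of degree d emits d(d-1)/2 pairs, so total map output is the sum of those terms over all nodes; if every degree is bounded by a constant, that sum grows linearly in n. This is one possible reading of the O(n) claim for sparse networks, stated here as an assumption rather than as the authors' analysis.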
Similar resources
Link Prediction using Network Embedding based on Global Similarity
Background: Link prediction is one of the most widely studied problems in complex network analysis. It requires knowing the history of previous link connections and combining it with the available information. Local link prediction approaches based on node structure are fast but not accurate enough. On the other hand, the global link predicti...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy for MapReduce in a homogeneous Hadoop cluster assume that each node in the cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
Less is More: Temporal Fault Predictive Performance over Multiple Hadoop Releases
We investigate search based fault prediction over time based on 8 consecutive Hadoop versions, aiming to analyse the impact of chronology on fault prediction performance. Our results confound the assumption, implicit in previous work, that additional information from historical versions improves prediction; though G-mean tends to improve, Recall can be reduced.
A Link Prediction Method Based on Learning Automata in Social Networks
Online social networks are now considered one of the most important emerging phenomena in human society. In these networks, link prediction relies on existing knowledge of the interactions between network actors to estimate the probability that a new relationship will form in the future. Link prediction has a wide range of applications, such as electro...
Scalable Link Prediction in Online Social Networks
We describe a link prediction method based on a scalable community detection algorithm. It can be used to recommend new links in a real world social network with millions of users. Using a Hadoop cluster, we test our implementation on a Twitter user network containing 40 million users and 1.4 billion connections. We show that communities detected can then be used to recommend new users to follo...